HiTune: Dataflow-Based Performance Analysis for Big Data Cloud

نویسندگان

  • Jinquan Dai
  • Jie Huang
  • Shengsheng Huang
  • Bo Huang
  • Yan Liu
چکیده

Although Big Data Cloud (e.g., MapReduce, Hadoop and Dryad) makes it easy to develop and run highly scalable applications, efficient provisioning and finetuning of these massively distributed systems remain a major challenge. In this paper, we describe a general approach to help address this challenge, based on distributed instrumentations and dataflow-driven performance analysis. Based on this approach, we have implemented HiTune, a scalable, lightweight and extensible performance analyzer for Hadoop. We report our experience on how HiTune helps users to efficiently conduct Hadoop performance analysis and tuning, demonstrating the benefits of dataflow-based analysis and the limitations of existing approaches (e.g., system statistics, Hadoop logs and metrics, and traditional

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

Kira: Processing Astronomy Imagery Using Big Data Technology

Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, HPC tools are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark—a modern platform for data intensive computing—to parallelize many-task applications. We...

متن کامل

Optimized Data Analysis Model in Cloud using Big Data Analytic Techniques

Due to the continuous increase in the volume and detail of data captured by an organization, there is an overwhelming flow of data in either structured or unstructured format. Data creation is occurring at a record rate, referred to as big data, and has emerged as a widely recognized trend. The advancements in data storage and mining technologies allow for the preservation of increasing amounts...

متن کامل

Distributed Scheduling of Event Analytics across Edge and Cloud

Internet of Things (IoT) domains generate large volumes of high velocity event streams from sensors, which need to be analyzed with low latency to drive decisions. Complex Event Processing (CEP) is a Big Data technique to enable such analytics, and is traditionally performed on Cloud Virtual Machines (VM). Leveraging captive IoT edge resources in combination with Cloud VMs can offer better perf...

متن کامل

On the usability of Hadoop MapReduce, Apache Spark & Apache flink for data science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011